Nature Machine Intelligence
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match Nature Machine Intelligence's content profile, based on 61 papers previously published here. The average preprint has a 0.13% match score for this journal, so anything above that is already an above-average fit.
HOU, Z.; Lee, V. H.-F.; Kwong, D. L.-W.; Guan, X.; Liu, Z.; Dai, W.
The advent of artificial intelligence (AI) has brought revolutionary tools for biomedical transcriptomic (RNA-level) research. However, persistent constraints remain, including limited interpretability in terms of biomedical concepts such as functional pathways, small sample sizes, and the substantial time and computing power required for AI training. To overcome these limitations, we developed RNAGAN (https://github.com/ZhaozhengHou-HKU/RNAGAN-1.0.git), an AI tool with a generative adversarial network (GAN) structure designed to enhance transcriptomic analysis. The network was established on public human datasets comprising 4.6 million single cells from multiple organs and 5,900 sequenced samples of various cancer types with normal references. A specialized pathway neural layer was embedded to extract activities of predefined pathways from the Human Molecular Signatures Database (MSigDB), or of newly learned pathways from single-cell data. The structure of RNAGAN (generator and discriminator) enables four applications after one shared training procedure: 1. single-cell and bulk-level patient stratification or differential diagnosis; 2. analysis of gene and pathway markers in a selected disease; 3. pseudo-data generation when sample size is limited for downstream analysis; 4. vectorization with gene- and pathway-level features learned from multiple datasets. RNAGAN contributes to the efficient utilization of limited data for transcriptomic studies.
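The pathway neural layer described in the abstract above can be pictured as a masked aggregation: each output unit sums only over the genes of one predefined pathway. The sketch below illustrates that idea; the gene sets, weights, and expression values are hypothetical, not taken from MSigDB or from RNAGAN itself.

```python
# Minimal sketch of a pathway-constrained layer: each pathway activity
# aggregates only its member genes. All gene sets and weights here are
# illustrative placeholders, not the model's actual parameters.

def pathway_activities(expression, pathways, weights):
    """expression: {gene: value}; pathways: {name: [genes]};
    weights: {(pathway, gene): w}. Returns {pathway: activity}."""
    acts = {}
    for name, genes in pathways.items():
        # Only member genes contribute -- the mask that imposes
        # biological structure on the layer.
        acts[name] = sum(weights.get((name, g), 0.0) * expression.get(g, 0.0)
                         for g in genes)
    return acts

expr = {"TP53": 2.0, "MYC": 1.0, "CDK1": 0.5}
pws = {"cell_cycle": ["MYC", "CDK1"], "apoptosis": ["TP53"]}
w = {("cell_cycle", "MYC"): 1.0, ("cell_cycle", "CDK1"): 2.0,
     ("apoptosis", "TP53"): 0.5}
acts = pathway_activities(expr, pws, w)
```

In a trained network the weights would be learned; the masking is what makes each hidden unit interpretable as one pathway's activity.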
Dee, W.; Wenteler, A.; Seal, S.; Morris, O.; Slabaugh, G.
Pervasive batch effects are a common issue, especially in recent large-scale Cell Painting datasets produced to aid AI-enhanced drug discovery efforts. Technical differences arising from experiments carried out in different batches can cause models to fail to generalize to unseen batches, despite good predictive performance within batch. We propose a biologically grounded test-time adaptation framework, SHOT-CCR, which uses cell-invariant gradient reversal to decouple morphological signal from experimental confounders. Our approach performs 4.5% better than the current RxRx1 benchmark, classifying 1,139 classes of siRNA genetic perturbations with 91.6% accuracy. We deliver consistent results over four distinct cell types and two prominent Cell Painting datasets, RxRx1 and a subset of JUMP-CP. Across 484 classes of CRISPR perturbations in JUMP-CP, our method improves accuracy by 15.7%.
Lu, H.-E.; Koivisto, D.; Lou, Y.; Zeng, Z.; Yu, T.; Wang, J.; Meng, X.; Nowikow, C.; Wilson, R.; Kumbhare, D.; Pu, J.
Deep learning has transformed medical image and video analysis, but it usually requires large, well-annotated datasets. In many clinical domains, especially when testing novel mechanistic hypotheses, such retrospective datasets are hard to obtain, since acquiring adequate cohorts is time-intensive, costly, and operationally difficult. This creates a critical translational gap: scientifically compelling early-stage ideas may remain untested for lack of the sample sizes that conventional deep learning pipelines require. Developing data-efficient strategies for evaluating new hypotheses within small prospective cohorts is therefore essential to de-risk innovation before large-scale validation. Myofascial Pain Syndrome (MPS) exemplifies this challenge, as quantitative ultrasound imaging biomarkers for MPS remain underexplored. We investigated whether MPS in the upper trapezius can be detected from full B-mode ultrasound videos in a small prospective cohort (11 controls, 13 patients). Videos were automatically preprocessed and resampled using a sliding-window strategy to expand the training samples (404 clips). A self-supervised Video Diffusion Encoder (VDE) was developed to learn spatiotemporal representations without relying on extensive labeled data and was compared with transfer-learning-based ResNet, VideoMAE, and SimCLR. Using subject-level stratified four-fold cross-validation, the VDE outperformed the transfer-learning baselines and achieved performance comparable to SimCLR, with a subject-level AUC of 0.79 and accuracy of 0.86, and no significant differences between latent-only and combined trigger-point analyses. These results demonstrate that self-supervised diffusion learning can support robust, data-efficient deep learning in small prospective studies, enabling early feasibility testing of innovative ultrasound biomarkers before large-scale clinical trials.
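The sliding-window resampling mentioned above is a simple clip-expansion trick; a sketch follows. The window and stride values are illustrative, not the study's actual settings.

```python
# Sketch of sliding-window resampling: one long video becomes many
# fixed-length, overlapping training clips. Window/stride are examples.

def sliding_clips(n_frames, window, stride):
    """Return (start, end) frame indices of every full-length clip."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]

clips = sliding_clips(n_frames=100, window=16, stride=8)
```

With 100 frames, a 16-frame window, and 50% overlap this yields 11 clips from a single recording, which is how a 24-subject cohort can still produce hundreds of training samples.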
Shen, L.; Chao, L.; Liu, T.; Liu, Q.; Zhou, G.; Wang, H.; Dong, X.; Li, T.; Zhang, X.; Ni, J.
While protein language models typically rely on sequence-only pretraining objectives, this approach often fails to capture structural regularities and demands large amounts of computation. To address this, we introduce ProteinSage, a pretraining framework that learns protein representations under explicit structural constraints. ProteinSage incorporates structural signals via structure-guided masking and a causal objective designed to model long-range dependencies. This structure-constrained pretraining endows ProteinSage with highly transferable representations that achieve superior performance across diverse structure-aware and general protein modeling benchmarks, while requiring substantially less computation. To determine whether these gains stem from genuine structural generalization rather than task-specific fitting, we applied ProteinSage to a structure-driven protein discovery task, focusing on proteins with multi-pass transmembrane helical architectures such as distantly related microbial rhodopsins. The model successfully identified six previously unannotated microbial rhodopsin homologs. Together, our work establishes structure-constrained pretraining as an effective pathway toward data-efficient and structurally faithful protein representation learning.
Peddi, N.; Bijjula, D. R.; Gogte, S.; Kondaparthi, V.
Major Histocompatibility Complex (MHC) molecules are essential to the immune system because they bind and present peptide antigens to T cells, enabling immune recognition and response. The specificity of MHC-peptide interactions is crucial for understanding immune-related diseases, developing personalized immunotherapies, and designing effective vaccines. Current computational methods, while powerful, often rely on a single type of molecular information, usually sequence, and only implicitly model the interaction between the two molecules. To address these limitations, we introduce MHC-Bind, a novel deep learning framework that captures a more comprehensive and biologically relevant view of the binding event. MHC-Bind's architecture employs a dual-view feature extraction strategy for both the MHC and the peptide. A Graph Attention Network (GAT) learns topological features from predicted residue contact maps, while a parallel 1D Convolutional Neural Network (CNN) captures multi-scale patterns from sequence embeddings. These four distinct feature sets are then integrated in a cross-fusion module that uses an attention mechanism to model interactions between the two molecules. Finally, a multi-layer perceptron (MLP) regression head maps the fused interaction signature to a precise binding affinity score. In rigorous comparative benchmarks against recent methods such as NetMHCpan, MHCFlurry, and MHCnuggets, MHC-Bind demonstrates superior performance, achieving a significantly lower average prediction error (RMSE: 0.1485) and a higher correlation (PCC: 0.7231) in allele-specific contexts. For pan-allele tasks, it excels at correctly ranking peptides with a superior Spearman's correlation (SCC: 0.7102), a crucial advantage for practical applications. The framework's design is inherently flexible, excelling in both allele-specific and pan-allele prediction tasks.
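The residue contact maps that feed the GAT branch above are typically built by thresholding pairwise distances between residue coordinates. A minimal sketch, with illustrative coordinates and an illustrative 4.0 Å cutoff (contact-map cutoffs vary by method):

```python
# Sketch of a residue contact map: residues are graph nodes, and an edge
# joins any pair whose coordinates lie within a distance cutoff.
import math

def contact_map(coords, cutoff=4.0):
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj

# Three residues on a line, 3.8 apart (a typical C-alpha spacing).
adj = contact_map([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)])
```

Only adjacent residues fall under the cutoff here; the resulting adjacency matrix is what a graph attention layer would consume.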
Colangelo, G.; Marti, M.
The space of possible phenotype profiles over the Human Phenotype Ontology (HPO) is combinatorially vast, whereas the space of candidate disease genes is far smaller. Phenotype-driven diagnosis is therefore highly non-bijective: many distinct symptom profiles can correspond to the same gene, but only a small fraction of the theoretical phenotype space is biologically and clinically plausible. When a structured ontology exists, this constraint can be exploited to generate realistic synthetic cases. We introduce GraPhens, a simulation framework that uses gene-local HPO structure together with two empirically motivated soft priors, over the number of observed phenotypes per case and phenotype specificity, to generate synthetic phenotype-gene pairs that are novel yet clinically plausible. We use these synthetic cases to train GenPhenia, a graph neural network that reasons over patient-specific phenotype subgraphs rather than flat phenotype sets. Despite being trained entirely on synthetic data, GenPhenia generalizes to real, previously unseen clinical cases and outperforms existing phenotype-driven gene-prioritization methods on two real-world datasets. These results show that when patient-level data are scarce but a structured ontology is available, principled simulation can provide effective training data for end-to-end neural diagnosis models.
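The simulation idea above (draw a plausible number of phenotypes per case, then sample terms tied to a gene) can be sketched in a few lines. Everything below is hypothetical: the HPO term IDs, the gene-to-term map, and the exponential stand-in for the paper's learned soft priors.

```python
# Sketch of ontology-constrained synthetic case generation: sample a
# phenotype count from a soft prior, then sample that many HPO terms
# associated with the target gene. All data and priors are placeholders.
import random

def synth_case(gene, gene_to_terms, rng, mean_terms=4):
    terms = gene_to_terms[gene]
    # Soft prior over phenotype count: an exponential draw clamped to
    # the available terms (a stand-in for the learned priors).
    k = min(len(terms), max(1, int(rng.expovariate(1.0 / mean_terms)) + 1))
    return rng.sample(terms, k)

rng = random.Random(0)
g2t = {"GENE_X": ["HP:0001", "HP:0002", "HP:0003", "HP:0004", "HP:0005"]}
case = synth_case("GENE_X", g2t, rng)
```

Each synthetic (phenotype set, gene) pair then serves as a labeled training example for the downstream gene-prioritization model.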
Cao, K.; Li, R.; Strazar, M.; Brown, E. M.; Nguyen, P. N. U.; Pust, M.-M.; Park, J.; Graham, D. B.; Ashenberg, O.; Uhler, C.; Xavier, R.
The interaction between T cell receptors (TCRs), peptides, and human leukocyte antigens (HLAs) underlies antigen-specific T cell immunity. Despite substantial advances in peptide-HLA presentation prediction, accurate modeling of coupled TCR-peptide-HLA recognition remains underdeveloped, limiting applications such as TCR and neoepitope prioritization in cancer and antigen identification in autoimmunity. Here we present StriMap, a unified framework for predicting TCR-peptide-HLA interactions by integrating physicochemical, sequence-context, and structural features at recognition interfaces. StriMap achieves state-of-the-art performance with improved generalizability and enables applications in both cancer and autoimmunity. As a case study in ankylosing spondylitis (AS), we screened 13 million peptides derived from 43,241 bacterial proteins and identified candidate molecular mimics that were experimentally validated to activate T cells expressing an AS-associated TCR. Notably, a top validated peptide was enriched in patients with inflammatory bowel disease (IBD), suggesting potential shared microbial triggers between AS and IBD. Overall, StriMap provides a generalizable framework for rational immunotherapy design and for dissecting antigenic drivers of autoimmunity.
Bian, B.; Zhang, Y.; Zhang, J.; Asai, K.; Saito, Y.
mRNA coding sequence design is a critical component in the development of mRNA vaccines, nucleic acid therapeutics, and heterologous gene expression systems. While large language models have recently been successfully applied to protein design and RNA modeling, designing optimal mRNA coding sequences for a given protein, particularly in a species-specific manner, remains a major challenge. Here, we present Pro2RNA, a multimodal reverse-translation language model that generates mRNA coding sequences from their corresponding protein sequences while explicitly conditioning on host organism taxonomy information. Pro2RNA integrates multiple pretrained language models across different modalities, including ESM2 for protein representation, SciBERT for taxonomy understanding, and a generative RNA language model for mRNA codon-level sequence generation. By training on mRNA-protein pairs from eukaryotic and bacterial datasets, respectively, Pro2RNA learns species-dependent genetic codes and codon usage patterns, enabling the generation of host-adapted and natural-like mRNA coding sequences. Across multiple benchmark evaluations, Pro2RNA matches or surpasses existing optimization methods, demonstrating its potential as a powerful and flexible framework for species-aware mRNA coding sequence design.
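The species-conditioning idea above contrasts with the classical baseline it improves on: picking each amino acid's most frequent codon in the host's usage table. A sketch of that baseline follows; the usage frequencies are illustrative, and the actual model generates codons with a language model rather than a lookup.

```python
# Sketch of naive host-aware reverse translation: choose each amino
# acid's most frequent codon for one host species. Frequencies below
# are illustrative placeholders, not real codon-usage data.

def reverse_translate(protein, codon_usage):
    """codon_usage: {aa: {codon: freq}} for one host organism."""
    return "".join(max(codon_usage[aa], key=codon_usage[aa].get)
                   for aa in protein)

usage_host = {  # hypothetical usage table for one host
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "F": {"TTT": 0.58, "TTC": 0.42},
}
cds = reverse_translate("MKF", usage_host)
```

Swapping in a different host's usage table changes the output sequence, which is exactly the degree of freedom the model learns to exploit more flexibly.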
Znabu, B. F.; Atif, Z.
Hepatocellular carcinoma (HCC) is a leading cause of cancer mortality worldwide, yet existing prognostic models incompletely capture its molecular heterogeneity. We developed an interpretable, attention-based multi-branch deep learning framework for multi-omics survival prediction in HCC. Using 358 TCGA LIHC patients with matched mRNA expression, miRNA expression, and DNA methylation data, we first reproduced the Chaudhary et al. autoencoder-based survival model as a baseline (C-index = 0.561, log-rank p = 3.10 × 10⁻²). We then designed a multi-branch architecture with omics-specific encoders, multi-head attention fusion, and Cox partial likelihood training, optimized via Bayesian hyperparameter search (100 Optuna trials). In 5-fold stratified cross-validation with nested feature selection (no data leakage), our attention model achieved a mean C-index of 0.683 ± 0.039, outperforming the autoencoder baseline (0.561) and a clinical-only model (0.637), and performing similarly to an AUTOSurv-like benchmark (0.697). Branch dropout enabled single-omics inference; external validation on the real GSE14520 cohort (n=221, mRNA) achieved a C-index of 0.637 (p = 0.004), comparable to Chaudhary et al.'s reported 0.67 on the same data. Integrated gradients and attention weights highlighted features with prior links to HCC biology, including cell cycle genes (CCNA2, PLK1) and a Wnt pathway component (FZD7), along with candidate biomarkers stable across all cross-validation folds (PZP, SGCB, CD300LG, ZNF831 for mRNA; 12 miRNAs; 6 CpG sites). Differential expression analysis between model-defined risk groups identified 381 significant genes (Bonferroni p < 0.05), though this analysis is partly circular. Multivariable Cox regression indicated that the model-derived risk score adds prognostic value beyond clinical variables, with consistent performance across clinical subgroups, though clinical integration metrics were evaluated on training data. This framework provides a transparent, biologically grounded approach to multi-omics prognostication in HCC.
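The C-index reported throughout the entry above is the concordance index: the fraction of comparable patient pairs in which the higher-risk patient actually fails earlier. A minimal sketch with illustrative data (ignoring ties in time, as a simplification):

```python
# Sketch of the concordance index used to evaluate survival models.
# Times, event flags, and risk scores below are toy values.

def c_index(times, events, risks):
    """Fraction of comparable pairs ordered correctly by risk.
    events[i]=1 means patient i's failure time was observed."""
    conc = comp = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if i has an observed event before t_j.
            if events[i] and times[i] < times[j]:
                comp += 1
                if risks[i] > risks[j]:
                    conc += 1          # higher risk failed first: concordant
                elif risks[i] == risks[j]:
                    conc += 0.5        # tied risks count half
    return conc / comp

ci = c_index(times=[2, 4, 6], events=[1, 1, 1], risks=[3.0, 2.0, 1.0])
```

A perfectly ordered risk score gives 1.0, random scores hover near 0.5, which is why values like 0.68 represent a meaningful but modest gain over the 0.56 baseline.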
Lteif, D.; Jia, S.; Bit, S.; Kaliaev, A.; Mian, A. Z.; Small, J. E.; Mangaleswaran, B.; Plummer, B. A.; Bargal, S. A.; Au, R.; Kolachalama, V. B.
Structural magnetic resonance imaging (MRI) is a cornerstone for diagnosing neurological disorders, yet automated interpretation of multi-sequence brain MRI remains limited by challenges in cross-sequence reasoning and protocol variability. Here we present ReMIND, a vision-language modeling framework tailored for comprehensive multi-sequence and multi-volumetric brain MRI analysis. Trained on over 73,000 deidentified patient visits encompassing more than 850,000 MRI sequences paired with radiology reports from diverse clinical and research cohorts, ReMIND combined large-scale instruction tuning on more than one million clinically grounded question-answer (QA) pairs with targeted supervised fine-tuning for radiology report generation. At inference, ReMIND employed modality-aware reranking and correction, a report-level decoding strategy that suppressed unsupported modality claims while preserving linguistic fluency and clinical coherence. Cross-cohort generalization was maintained on independent external datasets from different institutions. These findings represent an advance toward consistent and equitable brain MRI interpretation, meriting prospective evaluation to support diagnosis and management of neurological conditions.
Hochner-Vilk, T.; Stein, D.; Schueler-Furman, O.; Raveh, B.; Chook, Y. M.; Schneidman-Duhovny, D.
Domain-peptide interactions mediate a significant fraction of cellular protein networks, yet accurately predicting their specificity remains challenging. Peptide motifs typically have short, fuzzy sequence profiles, and their interactions are often weak and transient, limiting the size, coverage, and quality of experimentally validated domain-peptide datasets. Since true non-binders are rarely known, constructing negative examples often introduces bias. While structure-based prediction methods can achieve high accuracy, they are computationally demanding and difficult to scale to the proteome level. We introduce CLIPepPI, a dual-encoder model that leverages contrastive learning to embed domains and peptides into a shared space directly from sequence. Both encoders are initialized from a protein language model (ESM-C) and fine-tuned using lightweight LoRA adapters, enabling parameter-efficient training on positive pairs alone. To overcome data scarcity, we augment ~3K protein-peptide complexes from PPI3D with ~150K domain-peptide pairs derived from protein-protein interfaces. CLIPepPI further injects structural information by marking interface residues in the domain sequence, thus guiding the encoders toward binding regions and linking sequence-level learning with structural context. Competitive performance is achieved across three independent benchmarks: domain-peptide complexes from PPI3D, large-scale phage-library data from ProP-PD, and a curated dataset of nuclear export signal (NES) sequences. We demonstrate scalability and generalization through two applications: (i) proteome-wide NES scanning, and (ii) variant-effect prediction, where score changes in domain-peptide interactions between wild-type and mutant sequences discriminate pathogenic from benign variants. Together, CLIPepPI offers a scalable, structure-informed model for predicting domain-peptide specificity and generating meaningful embeddings suited for large-scale proteomic analyses.
CLIPepPI is available at: https://bio3d.cs.huji.ac.il/webserver/clipeppi/.
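The contrastive objective behind dual-encoder models like the one above pulls each domain embedding toward its own peptide while treating the rest of the batch as negatives. A sketch of that symmetric InfoNCE-style loss on toy embeddings (the vectors and temperature are illustrative):

```python
# Sketch of an in-batch contrastive loss for a dual encoder: each domain
# should match its own peptide against all others in the batch.
# Embeddings and temperature are toy values.
import math

def info_nce(domain_embs, peptide_embs, tau=0.5):
    """Mean cross-entropy of matching each domain to its own peptide."""
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)
    loss = 0.0
    for i, d in enumerate(domain_embs):
        logits = [cos(d, p) / tau for p in peptide_embs]
        z = sum(math.exp(l) for l in logits)
        loss += -math.log(math.exp(logits[i]) / z)
    return loss / len(domain_embs)

aligned = info_nce([(1, 0), (0, 1)], [(1, 0), (0, 1)])
mismatched = info_nce([(1, 0), (0, 1)], [(0, 1), (1, 0)])
```

Because negatives come for free from the batch, only positive pairs need to be curated, which is what makes training on positives alone feasible.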
Pilz, M.; Scheid, J.; Bauer, A.; Lemke, S.; Sachsenberg, T.; Bauer, J.; Nelde, A.; Stadelmaier, J.; Walter, A.; Rammensee, H.-G.; Nahnsen, S.; Kohlbacher, O.; Walz, J. S.
The immune system eliminates malignant and infected cells through T-cell-mediated recognition of peptides presented by human leukocyte antigen molecules. Mass spectrometry-based immunopeptidomics enables unbiased identification of naturally presented HLA-restricted peptides and has become central to the development of T-cell-based immunotherapies. However, immunopeptidomics data reflects the combined peptide presentation of multiple HLA alleles, and determining which allotypes are represented in this multi-allelic complexity remains an unmet computational challenge. Here, we introduce immunotype, a deep learning-based ensemble predictor for HLA class I allotyping directly from immunopeptidomics data. Immunotype integrates peptide and HLA protein sequence information through transformer encoders and a graph neural network, complemented by a curated mono-allelic reference of known peptide-HLA binding preferences. Immunotype achieves an overall accuracy of 87.2% at protein-level resolution across diverse tissues and thereby enables rapid, cost-effective HLA typing of large-scale immunopeptidomics datasets.
Hornak, G.; Heinolainen, A.; Solyomvari, K.; Silen, S.; Renkonen, R.; Koskinen, M.
Selecting an effective treatment relies on accurately anticipating a patient's response to alternative interventions. However, forecasting longitudinal clinical trajectories remains difficult because electronic health records contain heterogeneous, irregularly sampled data over extended time periods. These issues are especially relevant for laboratory measurements, which are central to diagnostics, assessment of therapeutic responses, and tracking disease progression in routine clinical practice. However, existing deep learning methods for counterfactual prediction usually assume regularly sampled data, an assumption incompatible with the irregular, heterogeneous data-generation processes of real-world clinical practice. Here we present the Time-Aware G-Transformer, which integrates causal G-computation with time-aware attention to predict counterfactual outcomes on irregular data. By explicitly conditioning on the timing of future observations and encoding measurement patterns, the model captures temporal dynamics that previous methods overlook. Evaluated on synthetic tumor growth data and on 90,753 cancer patient trajectories from an academic medical center, our approach demonstrates superior long-horizon (> 1 day) prediction accuracy and uncertainty calibration compared to state-of-the-art baselines. These results demonstrate that embedding temporal relations directly into the attention mechanism enables robust integration of patient history data for evaluating potential treatment strategies in personalized medicine.
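One simple way to embed temporal relations into attention, in the spirit of the entry above, is to penalize each observation's attention logit by its age, so stale measurements count less. The decay term below is an illustrative stand-in for the model's learned time encoding.

```python
# Sketch of time-aware attention weighting: subtract an age penalty from
# each attention logit before the softmax. Decay rate is illustrative.
import math

def time_aware_weights(scores, ages_hours, decay=0.1):
    logits = [s - decay * a for s, a in zip(scores, ages_hours)]
    m = max(logits)                      # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Two lab values with equal content scores: the recent one dominates.
w = time_aware_weights(scores=[1.0, 1.0], ages_hours=[48.0, 2.0])
```

With equal content scores, the 2-hour-old measurement receives far more weight than the 48-hour-old one, which is the behavior regular-grid attention cannot express.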
Hesse, J.; Schum, D.; Leidel, L.; Gareis, L. R.; Herrmann, J.; Müller, R.; Sieber, S. A.
Antibiotic resistance continues to rise, yet most new drug candidates act through long-established targets. Faster mode of action (MoA) assessment would enable more effective prioritization of screening hits and help identify compounds with novel mechanisms. In this study, we aimed to develop a scalable framework for MoA inference from antibiotic-induced cellular response profiles in Escherichia coli. We generated a multimodal dataset spanning more than 50 antibiotics, including proteome profiles, chemical structure descriptors, inhibitory concentrations and growth dynamics, and used it to build MAPPER (Mode of Action Prediction via Proteomics-Enhanced Representation), a framework comprising a fixed multimodal predictor and an uncertainty module. MAPPER accurately classified antibiotics across nine mechanistic classes, flagged compounds with likely novel mechanisms and retained predictive power in proteomics-only transfer experiments across mass spectrometry platforms and external data. Together, these results establish MAPPER as an innovative tool for MoA prediction and novelty detection, enabling prioritization of antibacterial candidates with distinct mechanisms.
Xiao, M.; He, Y.; Hu, J.; Zou, F.; Zou, B.
Perturbational transcriptomics links therapeutic compounds to cellular mechanisms and provides a powerful framework for drug discovery, but experimentally profiling transcriptional responses across diverse cell states, doses and durations is costly and often infeasible. Here we present DEPICT (Drug rEsponse Prediction in transCriptomics with Transformers), a deep learning framework that predicts condition-matched drug-induced transcriptional responses from baseline gene expression, perturbation settings and complementary drug representations. Using the LINCS L1000 dataset, DEPICT generalized to unseen drugs and cell types and outperformed five baseline strategies and two recent deep learning models. In the most challenging unseen-cell evaluation, DEPICT was the only model to surpass all baselines, improving differential-expression prediction accuracy and reducing perturbed-expression prediction error by 30.3% and 36.8%, respectively, relative to the next-best deep model. In a non-small cell lung cancer (NSCLC) case study, DEPICT-enabled virtual screening prioritized compounds predicted to reverse disease-associated transcriptional signatures. Notably, 13 of the top 20 prioritized compounds had either previously entered NSCLC-related clinical trials or been validated in NSCLC studies, supporting the translational relevance of the predicted perturbational profiles. DEPICT further enabled condition-matched drug synergy prediction and mechanistic exploration when experimentally matched profiles were unavailable. Together, these results show that accurate, condition-matched in silico perturbation profiling can scale transcriptomics-driven hypothesis generation for drug repurposing and combination discovery.
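The virtual-screening step described above prioritizes compounds whose predicted expression change opposes the disease signature. A common way to score that is a negative-similarity measure between the two signatures; the sketch below uses negative cosine similarity on illustrative gene values (the actual scoring in the paper may differ).

```python
# Sketch of signature-reversal scoring: a drug whose predicted
# transcriptional change anti-correlates with the disease signature is
# a reversal candidate. Gene values are toy numbers.
import math

def reversal_score(disease_sig, predicted_drug_sig):
    """Negative cosine similarity: higher means stronger reversal."""
    dot = sum(a * b for a, b in zip(disease_sig, predicted_drug_sig))
    na = math.sqrt(sum(a * a for a in disease_sig))
    nb = math.sqrt(sum(b * b for b in predicted_drug_sig))
    return -dot / (na * nb)

disease = [1.5, -0.8, 2.0]      # up/down-regulation per gene (toy)
reverser = [-1.4, 0.9, -2.1]    # predicted drug-induced change (toy)
score = reversal_score(disease, reverser)
```

Ranking a compound library by this score, computed against predicted rather than measured profiles, is what turns the response predictor into a screening tool.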
Wang, R.; Jin, K.; Pan, L.
Protein language models (PLMs) are increasingly central to protein engineering and drug discovery. Many high-performing systems, however, rely on large parameter counts, multiple sequence alignments (MSAs), explicit structural inputs, or computationally intensive attention mechanisms, limiting their accessibility and throughput. Here we present AINN-P1, a 167M-parameter protein language model trained exclusively on raw UniRef amino-acid sequences using an autoregressive next-token prediction objective. AINN-P1 employs a multiplicative LSTM (mLSTM) architecture--an attention-free, recurrent design that scales linearly with sequence length and avoids growing key-value caches during inference. We evaluate AINN-P1 on ProteinGym fitness prediction tasks spanning activity, binding, expression, and stability using a frozen-encoder protocol with lightweight few-shot regression heads. Under this protocol, AINN-P1 achieves an average Spearman ρ of 0.441 across four task categories and a Spearman ρ of 0.625 on stability--the highest among sequence-only models in our comparison set. Because our evaluation uses few-shot supervised regression rather than the zero-shot scoring employed by most ProteinGym leaderboard baselines, direct numerical comparison requires caution; we discuss this methodological distinction throughout. Beyond benchmark performance, AINN-P1 emphasizes practical deployability: its recurrent architecture avoids quadratic memory scaling, supports fixed-state inference on long sequences, and enables rapid adaptation through frozen embeddings rather than costly end-to-end fine-tuning. We discuss when sequence-only models are sufficient, when structural information remains beneficial, and how compact foundation models can serve as efficient front-end filters in drug discovery workflows.
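The mLSTM recurrence mentioned above differs from a standard LSTM in one step: an intermediate state mixes the input and previous hidden state multiplicatively before the usual gates. A scalar sketch follows (Krause et al.-style mLSTM; all weights are illustrative scalars, whereas the real model uses learned matrices).

```python
# Scalar sketch of a multiplicative LSTM step: the intermediate state m
# multiplies input and hidden contributions, then feeds the LSTM gates.
# All weights are illustrative placeholders.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlstm_step(x, h, c, w):
    m = (w["mx"] * x) * (w["mh"] * h)        # multiplicative mixing
    i = sigmoid(w["ix"] * x + w["im"] * m)   # input gate
    f = sigmoid(w["fx"] * x + w["fm"] * m)   # forget gate
    o = sigmoid(w["ox"] * x + w["om"] * m)   # output gate
    g = math.tanh(w["gx"] * x + w["gm"] * m) # candidate cell update
    c = f * c + i * g
    return o * math.tanh(c), c

w = {k: 0.5 for k in ["mx", "mh", "ix", "im", "fx", "fm",
                      "ox", "om", "gx", "gm"]}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 2.0]:                   # a toy input "sequence"
    h, c = mlstm_step(x, h, c, w)
```

Because the only state carried forward is (h, c), inference memory is fixed regardless of sequence length, unlike a transformer's growing key-value cache.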
Su, Z.; Wu, Y.
Controlling complex biological systems across multiple scales remains a major challenge in computational medicine, because whole-body disease behavior is closely shaped by noisy cellular events at much smaller scales. Standard deterministic models often miss this molecular variability, while fully stochastic simulations are too slow for the repeated, high-throughput interactions needed to train artificial intelligence. To address this problem, we developed a new AI-based framework that combines a discrete stochastic Gillespie algorithm for microscale receptor dynamics with continuous, nonlinear ordinary differential equations for systemic macroscale behavior. To reach the speed needed for deep reinforcement learning (RL), we compress this hybrid system into a differentiable Neural ODE surrogate that acts as a fast digital twin. As a proof of concept, we applied this framework to engineered cellular therapy and used RL agents to learn dynamic, closed-loop treatment policies inside the surrogate environment. By tracking microscopic, unpredictable cellular activity as an early-warning signal, the AI learned to continuously adjust the drug dose--anticipating and stopping dangerous immune reactions before they could spiral out of control. This computational advance improved successful control rates to more than 70% in highly unstable simulated phenotypes and provides a practical, general framework for adaptive intervention in multiscale biological systems.
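The microscale piece of the hybrid framework above is a Gillespie stochastic simulation. A minimal sketch for a birth-death receptor process (rates, counts, and the seed are illustrative; the paper couples this to macroscale ODEs, which are omitted here):

```python
# Sketch of the Gillespie stochastic simulation algorithm for a simple
# birth-death process: constant-rate binding (birth) and per-molecule
# unbinding (death) of receptors. Rates and counts are toy values.
import random

def gillespie_birth_death(n0, k_on, k_off, t_end, rng):
    t, n = 0.0, n0
    while t < t_end:
        rates = [k_on, k_off * n]            # [birth, death] propensities
        total = sum(rates)
        if total == 0:
            break
        t += rng.expovariate(total)          # exponential waiting time
        if rng.random() < rates[0] / total:  # pick which event fires
            n += 1
        else:
            n -= 1
    return n

rng = random.Random(42)
final = gillespie_birth_death(n0=10, k_on=5.0, k_off=0.5, t_end=50.0, rng=rng)
```

Runs fluctuate around the steady-state mean k_on/k_off; that run-to-run molecular noise is exactly the variability a deterministic ODE would average away, and what the Neural ODE surrogate must learn to reproduce cheaply.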
Cai, J.; Gatz, A. E.; Li, J.; Pal, D.; Tang, H.; Eadon, M. T.; Yang, B.; Meng, L.; Su, J.
Acute kidney injury in sepsis evolves over hours to days, yet most ICU models emphasize onset and provide limited insight into cardio-renal interactions. We developed AKI-twinX, an organ-structured, explainable digital twin that jointly forecasts acute kidney injury onset, acute kidney injury trajectory, and near-term mortality risk. The model learns renal and cardiovascular latent states with sparse feature gating and captures cross-organ coupling with attention. We trained AKI-twinX on MIMIC-IV sepsis using 5-fold cross-validation and evaluated it on an Indiana University Health cohort. Discrimination was consistent across systems (AUC: mortality 0.86-0.88, acute kidney injury onset 0.78-0.82, acute kidney injury trajectory 0.73-0.78). In vasopressor-treated windows, 12-hour systolic blood pressure forecasts tracked observed values (mean absolute error 8.5 mmHg). Counterfactual vasopressor withdrawal shifted predicted blood pressure downward and increased predicted risk, supporting sensitivity to clinically meaningful interventions. AKI-twinX enables trajectory-aware forecasting with bedside auditability in sepsis.
Liu, Y.; Zhang, Z.
Deep learning models utilizing longitudinal healthcare data have significantly advanced epidemiological research. However, contemporary transformer-based models increasingly rely on computationally intensive pre-training steps that entail processing massive real-world datasets with cost-prohibitive hardware. We introduce the Temporal Encoder with Late Fusion (TELF), a lightweight end-to-end predictive model featuring an encoder-only architecture for processing medical codes, followed by post-encoder concatenation with demographic variables. TELF learns code embeddings on-the-fly, thereby bypassing the resource-intensive pre-training bottleneck. Furthermore, its late-fusion design preserves the integrity of the temporal attention mechanism before integrating static demographic predictors. We evaluated TELF using an administrative claims database across three distinct cohorts: pancreatic cancer (n=53,661), type 2 diabetes (n=78,756), and heart failure (n=72,540). TELF consistently outperformed traditional machine learning baselines, including XGBoost, LightGBM, and logistic regression. Specifically, TELF achieved AUCs of 0.9150, 0.8199, and 0.8721 for pancreatic cancer, type 2 diabetes, and heart failure, respectively, compared with 0.9044, 0.7908, and 0.8535 for XGBoost and 0.9014, 0.7800, and 0.8466 for logistic regression. Beyond predictive superiority, TELF's isolated temporal attention mechanism enables population-level motif mining. By extracting high-attention temporal sequences, we mapped aggregated patient journey pathways, revealing interpretable clinical trajectories preceding disease onset. Collectively, these results demonstrate that TELF provides a resource-efficient and accessible framework for advanced temporal modeling in clinical and epidemiological research.
Carrillo Barrera, P.; Babey, A.; Pena, C. A.
The scalability of phage therapy as a viable alternative or complement to antibiotics is limited by the labor-intensive experimental screening required to identify compatible phage-bacterium pairs. To accelerate this discovery process, we propose FoundedPBI, an ensemble deep learning approach that leverages the emergent capabilities of genomic foundation models, large language models pre-trained on vast DNA corpora, to predict phage-bacterium interactions from DNA sequences alone. We employ an ensemble strategy that aggregates outputs from three state-of-the-art DNA language models into a unified meta-embedding, which is then processed by a neural classifier. Our approach makes two key contributions: (1) We demonstrate that performing ensemble learning across models trained on different genomic data--i.e., prokaryotic (Nucleotide Transformer v2, DNABERT-2) and bacteriophage (MegaDNA) genomes--captures partially orthogonal biological signals, yielding a 6% F1-score improvement over the best individual model. (2) We adapt long-context NLP aggregation strategies to handle whole bacterial and phage genomes (up to 5M base pairs) that exceed the foundation models' context windows (12-96K bp) by a factor of 50-100, a critical challenge largely unaddressed in prior genomic deep learning work. On the PredPHI benchmark, FoundedPBI achieves a 76% F1-score, outperforming the current state-of-the-art (PBIP) by 7%. On our internal dataset (CI4CB), we achieve a 93% F1-score, improving on our previous best method by 4%. These results demonstrate that ensemble learning with proper long-context handling enables effective knowledge transfer of genomic foundation models to specialized prediction tasks.
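The two mechanisms in the entry above (chunking genomes that exceed a model's context window, then concatenating per-model pooled embeddings into one meta-embedding) can be sketched as follows. The "models" below are hypothetical base-composition stand-ins for the real DNA language models, and the mean-pooling is one of several plausible aggregation choices.

```python
# Sketch of long-context handling plus ensemble meta-embeddings: chunk a
# genome to fit each model's context window, mean-pool per-chunk
# embeddings, then concatenate across models. Embedders are toy stand-ins.

def chunk(seq, window):
    return [seq[i:i + window] for i in range(0, len(seq), window)]

def toy_embed(seq, dim):
    # Stand-in embedding: per-chunk base composition (not a real model).
    return [seq.count(b) / max(len(seq), 1) for b in "ACGT"][:dim]

def meta_embedding(genome, models):
    """models: list of (context_window, dim), one per 'foundation model'."""
    out = []
    for window, dim in models:
        chunks = chunk(genome, window)
        pooled = [sum(v) / len(chunks)           # mean-pool over chunks
                  for v in zip(*(toy_embed(c, dim) for c in chunks))]
        out.extend(pooled)                       # concatenate across models
    return out

vec = meta_embedding("ACGT" * 1000, models=[(1000, 4), (500, 4)])
```

The resulting fixed-length vector, one slice per model, is what a downstream classifier would consume regardless of genome length.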